Bow-tie structures of Twitter discursive communities

Bow-tie structures were introduced to describe the World Wide Web (WWW): in the directed network whose nodes are websites and whose edges are the hyperlinks connecting them, most nodes belong to a bow-tie, i.e. a Weakly Connected Component (WCC) composed of 3 main sectors: IN, OUT and SCC. SCC is the main Strongly Connected Component of the WCC, i.e. its largest subgraph in which each node is reachable from every other one. The IN and OUT sectors are the sets of nodes not included in the SCC that, respectively, can reach and can be reached from nodes in the SCC. In the WWW, most websites are found in the SCC, while search engines belong to IN and authorities, such as Wikipedia, are in OUT. In the analysis of the Twitter debate, the recent literature has focused on discursive communities, i.e. clusters of accounts interacting among themselves via retweets. In the present work, we studied discursive communities in 8 different thematic Twitter datasets in various languages. Surprisingly, we observed that almost all discursive communities therein display a bow-tie structure during political or societal debates. Bow-ties are instead absent when the topic of the discussion is of a different kind, such as sport events, as in the case of the Euro2020 Turkish and Italian datasets. We furthermore analysed the quality of the content created in the various sectors of the different discursive communities, using the domain annotation from the fact-checking website Newsguard: we observe that, when a discursive community is affected by m/disinformation, the content with the lowest quality is the one produced and shared in the SCC and, in particular, a strong incidence of low- or non-reputable messages is present in the flow of retweets between the SCC and the OUT sectors.
In this sense, in discursive communities affected by m/disinformation, most accounts have access to a great variety of content, whose quality is, however, generally quite low; such a situation aptly describes the phenomenon of infodemic, i.e. access to “an excessive amount of information about a problem, which makes it difficult to identify a solution”, according to the WHO.
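As an illustration of the definitions above, the following minimal Python sketch splits a directed network into the SCC, IN and OUT sectors. It is not the pipeline used in this work: the function names (`reachable`, `bow_tie_sectors`) and the toy edge list are hypothetical, and, as a simplification, every node outside the three main sectors (tendrils, tubes and nodes outside the main WCC alike) is collected in a single OTHERS set.

```python
from collections import defaultdict, deque


def reachable(start_nodes, adjacency):
    """Return every node reachable from start_nodes (start nodes included)."""
    seen = set(start_nodes)
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def bow_tie_sectors(edges):
    """Split the nodes of a directed edge list into SCC, IN, OUT and OTHERS."""
    successors, predecessors = defaultdict(set), defaultdict(set)
    nodes = set()
    for source, target in edges:
        successors[source].add(target)
        predecessors[target].add(source)
        nodes.update((source, target))

    # The strongly connected component of a seed node is the set of nodes
    # that the seed can reach and that can reach the seed.
    largest_scc, unassigned = set(), set(nodes)
    while unassigned:
        seed = next(iter(unassigned))
        component = reachable([seed], successors) & reachable([seed], predecessors)
        unassigned -= component
        if len(component) > len(largest_scc):
            largest_scc = component

    # IN can reach the SCC; OUT can be reached from the SCC.
    in_sector = reachable(largest_scc, predecessors) - largest_scc
    out_sector = reachable(largest_scc, successors) - largest_scc
    others = nodes - largest_scc - in_sector - out_sector
    return {"SCC": largest_scc, "IN": in_sector, "OUT": out_sector, "OTHERS": others}


# Hypothetical toy network: a 3-cycle with one feeder, one sink and a stray edge.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("i", "a"), ("c", "o"), ("x", "y")]
toy_sectors = bow_tie_sectors(toy_edges)
```

The repeated forward and backward searches keep the sketch short but scale poorly; for networks of the size discussed here a linear-time SCC algorithm (e.g. Tarjan's) would be the natural replacement.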


German Covid-19 dataset
The German Covid-19 dataset contains 1,552,106 tweets shared between February 2 and April 23, 2020. We identified the following discursive communities:
• AfD: this group contains accounts of politicians of the German nationalist and right-wing party "Alternative for Germany (AfD)";
• LEFT-WING: this community collects politicians of various German left-wing parties, such as the "Social Democratic Party (SPD)", "Alliance 90/The Greens" and "Die Linke" (literally, "the left");
• GOVERNMENT: this community contains official accounts of German ministries and institutions, such as the Foreign, Defense or Health Ministries. It also contains politicians of the "Christian Democratic Union of Germany (CDU)";
• MEDIA: it includes the official accounts of the main German newspapers, blogs, TV channels, journalists and other media in general.
Table 1 and the bar chart in Fig. 1 show the size of each discursive community. As in the other Covid-19 datasets, the MEDIA group is the largest one, with approximately 70% of the nodes of the entire network. Fig. 2 shows the bow-tie structures of the four discursive communities. As in the main text, the size of each sector is proportional to its number of nodes and the color indicates the p-value encoding the mismatch with the predictions of the Directed Configuration Model (described in the Methods section of the main text). The AfD, GOVERNMENT and MEDIA groups display informative bow-tie structures; all of them are OUT-dominant, but only AfD has a strong bow-tie. In the LEFT-WING community the bow-tie is uninformative, with approximately 60% of the vertices in the OTHERS sector. In agreement with the results of the Italian Covid-19 dataset, the OTHERS block is significantly smaller than predicted for the three communities with an informative bow-tie, but not for LEFT-WING. In this dataset too, the AfD discursive community, which contains right-oriented and conservative accounts, shows a larger and denser SCC: it is the only community with more than 10% of the nodes and 25% of the links within the SCC, in which each vertex has over 20 links attached to it on average (see Fig. 3). The accounts in the AfD discursive community are also the ones that most often retweet URLs of web pages flagged by Newsguard as untrustworthy. Indeed, we found approximately 3,500 retweets of this type in AfD, about 200 in MEDIA, 20 in LEFT-WING and none at all in GOVERNMENT. For AfD, 30% of them originate from the SCC and end in the OUT sector, 25% are located between IN and OUT and 20% remain in the SCC. Therefore, in over 50% of the cases a user shares untrustworthy content coming from the SCC.
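The sector-by-sector breakdown quoted above amounts to counting the flagged retweets by the sectors of their two endpoints. A minimal sketch of that bookkeeping follows, assuming each retweet is stored as a (source, target) pair and that a `sector_of` mapping from account to sector is already available; both names, the edge orientation and the toy data are hypothetical and are not taken from this work.

```python
from collections import Counter


def flow_shares(flagged_retweets, sector_of):
    """Percentage of flagged retweets for each (source sector, target sector) pair."""
    counts = Counter(
        (sector_of[source], sector_of[target]) for source, target in flagged_retweets
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pair: 100.0 * count / total for pair, count in counts.items()}


# Hypothetical accounts, sectors and flagged retweets.
toy_sector_of = {"u1": "SCC", "u2": "SCC", "u3": "OUT", "u4": "IN"}
toy_retweets = [("u1", "u3"), ("u2", "u3"), ("u1", "u2"), ("u4", "u3")]
toy_shares = flow_shares(toy_retweets, toy_sector_of)

# Share of flagged retweets whose source lies in the SCC, whatever the target.
from_scc = sum(share for (source, _), share in toy_shares.items() if source == "SCC")
```

Summing the shares whose source sector is the SCC gives the kind of aggregate figure reported in the text for content originating from the SCC.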

French Covid-19 dataset
The French Covid-19 dataset consists of 3,052,708 posts about the epidemic published between March 23 and April 7. We identified 4 different discursive communities:
• RIGHT-WING: it collects conservative and right-oriented accounts from French parties like "Rassemblement National", "Les Républicains" and "Les Identitaires";
• LEFT-WING: this community contains the politicians and the supporters of center-left French parties like "La France Insoumise" or the Socialist Party ("Parti Socialiste");
• GOVERNMENT: it collects accounts of institutions and ministries, like the official account of the French government or that of the "Ministère des solidarités et de la santé" (Ministry of Solidarity and Health). It also contains politicians from the party "La République En Marche", whose leader is president Macron;
• MEDIA: this is the usual community containing official accounts of various media and journalists.
As in the other Covid-19 datasets, the MEDIA community is the largest one, with approximately 60% of the nodes of the network (Fig. 4). Tab. 2 shows the communities' sizes. For this dataset we could not compare the observations with the predictions of the entropy-based null model because of the huge size of its discursive communities: for these groups, generating a sample of the graph ensemble and analysing the bow-tie structure of the sampled graphs took too long. Therefore, the following plots carry no information about the p-values. Fig. 5 shows that each discursive community displays an informative bow-tie structure. Remarkably, all of them are OUT-dominant, with no less than 40% of the nodes in every OUT sector. The mismatch in the number of nodes and links in the SCC between the right-wing community and the others still holds, but to a much lesser extent (see Fig. 6). When matching Newsguard's data with Twitter's, we get results similar to those observed in the other datasets: we found 979 retweets with URLs to untrustworthy web pages in the right-wing community, 103 in the left-wing one and none in the others. For the former community, 25% are located between SCC and OUT, 22% between IN and OUT, 14% in the SCC, 12% between IN and SCC and far fewer between the other sectors. In the case of the left-wing community, 45% of these retweets are located between SCC and OUT and 31% in the SCC.

Figure 2. The bow-tie structure of the discursive communities of the German Covid-19 dataset. The size of each sector is proportional to the number of nodes it contains and the color indicates the distance between the observed and the predicted sizes through ln(p-value). The AfD, GOVERNMENT and MEDIA groups display an informative bow-tie structure, i.e. the OTHERS sector represents less than 50% of the nodes. Compared with the predictions of the Directed Configuration Model, the observed OTHERS sector is significantly smaller (at a significance level of 1%) for all the communities except the LEFT-WING one.

Figure 3. Percentage of nodes and edges in the SCC for the communities in the German Covid-19 dataset. As in the other datasets, the conservative and right-oriented discursive community (AfD) has a larger and denser SCC, as displayed in the two top panels. The bottom panel shows that the AfD group also ranks first when considering the links per node in the SCC.

Table 3. Dimension of the discursive communities of the Dutch Elections dataset, before and after the label propagation procedure (i.e., considering verified accounts only and all users).

Dutch elections dataset
The Dutch elections dataset consists of 1,002,499 tweets posted between February 2 and March 31, 2021. In this case almost every discursive community bears the name of a specific Dutch political party: "GroenLinks" (center-left, green), "Christian Democratic Appeal (CDA)" (center, Christian-democratic), "Democrats 66 (D66)" (center/center-left, liberal), "People's Party for Freedom and Democracy (VVD)" (conservative-liberal) and "Labour Party (PvdA)" (center-left, social-democratic). Then we have the CONSERVATIVES community, which collects accounts from right-oriented parties like the "Party for Freedom" or the "Forum for Democracy", and MEDIA & S.P., which is the usual MEDIA community plus a couple of accounts belonging to the Dutch "Socialist Party". Fig. 7 and Table 3 show the sizes of these seven discursive communities. As can be observed in Fig. 8, all the discursive communities show an informative bow-tie structure, with the only exception of VVD.
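Following the criterion used in this document, a bow-tie is informative when the OTHERS sector holds less than 50% of the nodes. The sketch below applies that check to a dictionary of sector sizes; the function name, the toy counts and, in particular, the rule used to pick the dominant sector (the largest sector other than OTHERS) are assumptions made for illustration and may differ from the definitions given in the main text.

```python
def describe_bow_tie(sector_sizes, others_key="OTHERS"):
    """Summarise a bow-tie from a {sector name: number of nodes} dictionary."""
    total = sum(sector_sizes.values())
    if total == 0:
        raise ValueError("the community has no nodes")
    shares = {name: size / total for name, size in sector_sizes.items()}

    # Informative bow-tie: OTHERS holds less than half of the nodes.
    informative = shares.get(others_key, 0.0) < 0.5

    # Assumption: the dominant sector is the largest one apart from OTHERS.
    named = {name: share for name, share in shares.items() if name != others_key}
    dominant = max(named, key=named.get) if named else None
    return {"shares": shares, "informative": informative, "dominant": dominant}


# Hypothetical node counts for a single community.
toy_summary = describe_bow_tie({"IN": 50, "SCC": 100, "OUT": 450, "OTHERS": 400})
```

With these toy counts OTHERS holds 40% of the nodes, so the bow-tie counts as informative, and OUT is the largest remaining sector.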

Figure 8. The bow-tie structure of the discursive communities of the Dutch elections dataset. All the discursive communities in this dataset display a strong bow-tie structure, except the VVD one.

The conservative and right-oriented discursive community (CONS.) has a larger and denser SCC, as displayed in the two top panels. The bottom panel shows that the CONS. group also outperforms the other communities when considering the links per node in the SCC.
The CONSERVATIVES group is again the community with the highest percentages of nodes and links within the SCC (see Fig. 9). Indeed, the SCC contains more than 40% of the links of the entire network. This dataset contains just a few retweets with URLs to web pages flagged as untrustworthy by Newsguard. However, they are all located in the CONSERVATIVES community: 153 in total, of which 55% are in the SCC and 45% between SCC and OUT.
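The three quantities compared across communities (share of nodes in the SCC, share of links in the SCC and links per node in the SCC) can be computed from an edge list once the SCC is known. The sketch below is a hypothetical illustration, not the code used in this work; in particular it takes "links per node" to mean the number of links internal to the SCC divided by the number of SCC nodes, which is an assumption (counting both the incoming and the outgoing end of each link would double the value).

```python
def scc_statistics(edges, scc_nodes):
    """Node share, link share and links per node of the SCC of one community."""
    if not edges or not scc_nodes:
        raise ValueError("edges and scc_nodes must both be non-empty")
    nodes = {node for edge in edges for node in edge}

    # A link belongs to the SCC when both of its endpoints do.
    internal_links = [
        (source, target)
        for source, target in edges
        if source in scc_nodes and target in scc_nodes
    ]
    return {
        "node_share": len(scc_nodes & nodes) / len(nodes),
        "link_share": len(internal_links) / len(edges),
        "links_per_node": len(internal_links) / len(scc_nodes),
    }


# Hypothetical community: a 3-node SCC with one feeder and one sink.
toy_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c"), ("c", "o"), ("i", "a")]
toy_stats = scc_statistics(toy_edges, {"a", "b", "c"})
```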

Italian debate on migrants dataset
This Italian dataset contains Twitter posts about the migration flows from Northern Africa. The dataset consists of 1,081,780 posts published between January 23, 2019 and February 22, 2019. Using the Louvain community detection algorithm (see the main text), the network was partitioned into the following communities: DX (right-oriented Italian parties such as "Lega Nord"), CSX (left-oriented Italian parties such as the Democratic Party and other minor center-left parties), M5S (the "Five Star Movement" party) and the usual MEDIA community. The first two communities are the largest ones (see Fig. 10 and Table 4). Fig. 11 shows the bow-tie structures of the four discursive communities in this dataset. The two largest communities, DX and CSX, display informative bow-ties, while in the two smaller ones the nodes are mostly located in the OTHERS sector, especially for the MEDIA community (above 95%). Judging from the colors in the graphs, the two smaller communities are, in general, in better agreement with the Directed Configuration Model.
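The colors in the bow-tie figures encode p-values obtained by comparing each observed sector size with an ensemble of graphs sampled from the null model. One simple way to estimate such a p-value from a sample is sketched below; the function name and the numbers are hypothetical, and the exact procedure used in this work is the one described in the Methods section of the main text.

```python
def empirical_p_value(observed, ensemble_values, tail="lower"):
    """Fraction of sampled values at least as extreme as the observed one.

    tail="lower" tests whether the observation is unusually small,
    tail="upper" whether it is unusually large.
    """
    if not ensemble_values:
        raise ValueError("the ensemble sample is empty")
    if tail == "lower":
        extreme = sum(1 for value in ensemble_values if value <= observed)
    elif tail == "upper":
        extreme = sum(1 for value in ensemble_values if value >= observed)
    else:
        raise ValueError("tail must be 'lower' or 'upper'")
    return extreme / len(ensemble_values)


# Hypothetical OTHERS-sector sizes measured on ten sampled graphs.
toy_samples = [30, 45, 50, 55, 60, 42, 38, 70, 65, 58]
toy_p_lower = empirical_p_value(40, toy_samples, tail="lower")
```

With a finite sample the estimate can be exactly zero, in which case its logarithm is undefined and a larger sample (or a lower bound of one over the sample size) would be needed before plotting ln(p-value).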
The DX community, which again contains politicians of right-oriented Italian parties, has the largest and densest SCC (Fig. 12), such that a node therein has over 25 links on average. Newsguard data tell us that 15,160 retweets in the DX network contain URLs of untrustworthy web pages, against only 14 in CSX, 3 in MEDIA and none in M5S. In the case of the DX community, 59% of them can be found inside the SCC and 36% between the SCC and OUT.

The bar chart displays the percentage of nodes in each discursive community. The DX and CSX groups are the largest ones, with between 30% and 50% of the nodes. Among the datasets we analysed, this is the only one in which the percentage of nodes not assigned to a discursive community by the label propagation procedure exceeds 15%.

Again, the conservative and right-oriented discursive community (DX) has a larger and denser SCC, as displayed in the two top panels. The bottom panel shows that the DX group also ranks first when considering the links per node in the SCC.

In this bar chart the percentage of nodes in each discursive community is displayed. The LEFT-WING COMMENTATORS group is the largest one, with over 60% of the nodes.

All communities but MEDIA and PD display informative bow-ties and, among them, the DX and IV ones are strong.

Italian debate on Astrazeneca vaccine
This dataset contains 583,236 Twitter posts published in Italy about the safety of the Astrazeneca vaccine against Covid-19. The posts were shared between March 15, 2021 and May 15, 2021. We identified the following discursive communities:
• DX: this is the usual right-oriented and conservative community found in the other Italian datasets as well, i.e. it contains accounts from the "Lega" and "Fratelli d'Italia" parties;
• PD: the Italian Democratic Party (center-left);
• IV: it collects the politicians of the "Italia Viva" party (center-left);
• LEFT-WING COMMENTATORS: this particular community is formed by several well-known personalities, often left-oriented, who are not politicians but journalists, bloggers, actors or entertainers. This community also contains the most famous Italian epidemiologist, Roberto Burioni;
• M5S: the Italian populist party "Movimento 5 Stelle";
• MEDIA: the usual community containing official accounts of newspapers, blogs, TV channels, radio stations and others.
The distribution of the nodes in these six communities is shown in Fig. 13 and the related sizes are reported in Table 5. The two biggest communities, DX and LEFT-WING COMMENTATORS, show, respectively, a strong and a weak bow-tie structure (Fig. 14), denoting, once more, that the strength of the structure does not depend on the size of the community. Nevertheless, they are both OUT-dominant. In the M5S and IV communities, INTENDRILS and OUT are nearly balanced as the dominant sector. While the MEDIA bow-tie is poorly informative, the PD one is not informative, with over 50% of the nodes in the OTHERS sector.

Table 5. Dimension of the discursive communities in the dataset on the Italian debate on Astrazeneca vaccine, before and after the label propagation procedure (i.e., considering verified accounts only or all accounts).
The DX community is again the one with the largest and densest strongly connected component: the average degree of the nodes in its SCC is greater than 27 links, while in the other communities there are always fewer than 10 links per node (Fig. 15). According to Newsguard, 728 retweets in the DX community contained URLs to untrustworthy pages. They are distributed as follows: 43% between SCC and OUT, 26% in the SCC, 13% between IN and SCC, 12% between IN and OUT and far fewer between the other sectors. We found only two retweets of this type in the M5S community and none in the others.

Figure 15. Percentage of nodes and edges in the SCC for the communities in the Italian debate on Astrazeneca vaccine dataset. Again, the conservative and right-oriented discursive community (DX) has a larger and denser SCC, as displayed in the two top panels. The bottom panel shows that the DX group also ranks first when considering the links per node in the SCC.